A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis


Abstract

Speech generated by hidden Markov model (HMM)-based statistical parametric speech synthesis still sounds “muffled”. One cause of this degradation in speech quality may be the loss of fine spectral structures. In this paper, we propose to use a deep generative architecture, a generatively trained deep neural network (DNN), as a postfilter. The network models the conditional probability of the spectrum of natural speech given that of synthetic speech, to compensate for the gap between synthetic and natural speech. The proposed probabilistic postfilter is generatively trained by cascading two restricted Boltzmann machines (RBMs) or deep belief networks (DBNs) with one bidirectional associative memory (BAM). We devised two types of DNN postfilters: one operating in the mel-cepstral domain and the other in the higher-dimensional spectral domain. We compare these two new data-driven postfilters with other types of postfilters currently used in speech synthesis: a fixed mel-cepstral-based postfilter, global variance-based parameter generation, and modulation spectrum-based enhancement. Subjective evaluations using the synthetic voices of a male and a female speaker confirmed that the proposed DNN-based postfilter in the spectral domain significantly improved the segmental quality of synthetic speech compared with conventional methods.
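
The cascade described in the abstract can be pictured as a feed-forward pass through three stages: an RBM over synthetic-speech frames, a BAM linking the two hidden layers, and an RBM over natural-speech frames. The Python sketch below illustrates only that data flow. It is not the authors' implementation: the class name `CascadedPostfilter`, the layer sizes, the sigmoid hidden units with linear output, and the random (untrained) weights are all assumptions made for illustration, and the generative training procedure is omitted entirely.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for the hidden-unit activations."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, x):
    """Compute weights @ x + bias, with weights stored as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights standing in for generatively trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


class CascadedPostfilter:
    """Hypothetical RBM -> BAM -> RBM cascade (untrained, illustration only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = random.Random(seed)
        # RBM over synthetic-speech frames (hidden x visible weights).
        self.w_syn = random_matrix(n_hidden, n_visible, rng)
        self.b_syn = [0.0] * n_hidden
        # BAM connecting the synthetic hidden layer to the natural hidden layer.
        self.w_bam = random_matrix(n_hidden, n_hidden, rng)
        self.b_bam = [0.0] * n_hidden
        # RBM over natural-speech frames (hidden x visible weights).
        self.w_nat = random_matrix(n_hidden, n_visible, rng)
        self.b_nat = [0.0] * n_visible

    def apply(self, synthetic_frame):
        # Encode the synthetic frame with the first RBM's hidden layer.
        h_syn = [sigmoid(a) for a in affine(self.w_syn, self.b_syn, synthetic_frame)]
        # Map synthetic hidden activations to natural hidden activations.
        h_nat = [sigmoid(a) for a in affine(self.w_bam, self.b_bam, h_syn)]
        # Decode with the second RBM: linear output as the visible-layer mean.
        return affine(transpose(self.w_nat), self.b_nat, h_nat)


postfilter = CascadedPostfilter(n_visible=5, n_hidden=8)
enhanced_frame = postfilter.apply([0.1, -0.2, 0.3, 0.0, 0.5])
```

With trained weights, `apply` would be called once per frame of mel-cepstral or spectral features; here the output only has the right dimensionality and carries no acoustic meaning.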
